Smart Paper Notes

Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization

Congbo Ma , Wei Emma Zhang , Pitawelayalage Dasun Dileepa Pitawela , Yutong Qu , Haojie Zhuang , Hu Wang

Category: Natural Language Processing

2022-09-13

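A key challenge in multi-document summarization is capturing the relations among input documents, which is what distinguishes multi-document summarization (MDS) from single-document summarization (SDS). Few existing MDS works address this issue. One effective approach is to encode document positional information to help the model capture cross-document relations. However, existing MDS models, such as Transformer-based models, consider only token-level positional information. Moreover, these models fail to capture the linguistic structure of sentences, which inevitably causes confusion in the generated summaries. In this paper, we therefore propose document-aware positional encoding and linguistic-guided encoding that can be fused with the Transformer architecture for MDS. For document-aware positional encoding, we introduce a general protocol to guide the selection of document encoding functions. For linguistic-guided encoding, we propose embedding syntactic dependency relations into a dependency relation mask, using a simple but effective non-linear encoding learner for feature learning. Extensive experiments show that the proposed model can generate high-quality summaries.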
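
The idea of combining token-level and document-level positional information can be illustrated with a minimal sketch. It assumes the standard sinusoidal Transformer encoding for both levels and simply sums them. The abstract does not specify the document encoding function, so the choice of a sinusoid over the document index, and the names `sinusoidal` and `document_aware_encoding`, are illustrative assumptions and not the authors' method.

```python
import math
from typing import List


def sinusoidal(position: int, dim: int) -> List[float]:
    """Standard sinusoidal encoding of a single integer position."""
    vec = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def document_aware_encoding(doc_lengths: List[int], dim: int) -> List[List[float]]:
    """Encode every token of several concatenated documents.

    Each token gets its token-level encoding plus an encoding of the index
    of the document it came from (assumption: the two are summed).
    """
    encodings = []
    token_pos = 0
    for doc_idx, length in enumerate(doc_lengths):
        doc_vec = sinusoidal(doc_idx, dim)  # shared by all tokens of this document
        for _ in range(length):
            tok_vec = sinusoidal(token_pos, dim)
            encodings.append([t + d for t, d in zip(tok_vec, doc_vec)])
            token_pos += 1
    return encodings


# Two documents with 3 and 2 tokens, encoded in 4 dimensions.
enc = document_aware_encoding([3, 2], dim=4)
print(len(enc))  # 5 token encodings
print(enc[0])    # first token of the first document: [0.0, 2.0, 0.0, 2.0]
```

With this sketch, two tokens at the same token position but in different documents would receive different encodings, which is the property the document-level term is meant to provide.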

translated by 谷歌翻译

Knowledge-aware Document Summarization: A Survey of Knowledge, Embedding Methods and Architectures

Yutong Qu , Wei Emma Zhang , Jian Yang , Lingfei Wu , Jia Wu

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-04-24

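Knowledge-aware methods have boosted a range of natural language processing applications over the past decades. With this gathering momentum, knowledge has recently attracted considerable attention in document summarization, one such natural language processing application. Previous works report that knowledge-embedded document summarizers excel at generating superior digests, especially in terms of informativeness, coherence, and factual consistency. This paper presents the first systematic survey of state-of-the-art methodologies that embed knowledge into document summarizers. In particular, we propose novel taxonomies to recapitulate knowledge and knowledge embeddings from the document summarization perspective. We further explore how embeddings are generated in the embedding learning architectures of document summarization models, especially deep learning models. Finally, we discuss the challenges of this topic and future directions.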

A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Jian Cao , Chen Qian , Yihui Huang , Dicheng Chen , Yuncheng Gao , Jiyang Dong , Di Guo , Xiaobo Qu

Category: Machine Learning

2022-12-29

Implicit regularization is an important way to interpret neural networks. Recent theory has begun to explain implicit regularization with the model of deep matrix factorization (DMF) and to analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, and thus fit well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but becomes computationally difficult for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, namely landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.

Infusing Definiteness into Randomness: Rethinking Composition Styles for Deep Image Matting

Zixuan Ye , Yutong Dai , Chaoyi Hong , Zhiguo Cao , Hao Lu

Category: Computer Vision

2022-12-27

We study the composition style in deep image matting, a notion that characterizes a data generation flow determining how to exploit limited foregrounds and random backgrounds to form a training dataset. Prior art executes this flow in a completely random manner by simply going through the foreground pool or by optionally combining two foregrounds before foreground-background composition. In this work, we first show that naive foreground combination can be problematic and therefore derive an alternative formulation to reasonably combine foregrounds. Our second contribution is an observation that matting performance can benefit from a certain occurrence frequency of combined foregrounds and their associated source foregrounds during training. Inspired by this, we introduce a novel composition style that binds the source and combined foregrounds in a definite triplet. In addition, we find that different orders of foreground combination lead to different foreground patterns, which further inspires a quadruplet-based composition style. Results under controlled experiments on four matting baselines show that our composition styles outperform existing ones and yield consistent performance improvements on both composited and real-world datasets. Code is available at: https://github.com/coconuthust/composition_styles

Principled and Efficient Transfer Learning of Deep Models via Neural Collapse

Xiao Li , Sheng Liu , Jinxin Zhou , Xinyu Lu , Carlos Fernandez-Granda , Zhihui Zhu , Qing Qu

Categories: Machine Learning | Artificial Intelligence | Computer Vision | Machine Learning (Statistics)

2022-12-23

With ever-growing model sizes and the limited availability of labeled training data, transfer learning has become an increasingly popular approach in many science and engineering domains. For classification problems, this work delves into the mystery of transfer learning through an intriguing phenomenon termed neural collapse (NC), where the last-layer features and classifiers of learned deep networks satisfy: (i) the within-class variability of the features collapses to zero, and (ii) the between-class feature means are maximally and equally separated. Through the lens of NC, our findings for transfer learning are the following: (i) when pre-training models, preventing intra-class variability collapse (to a certain extent) better preserves the intrinsic structures of the input data, which leads to better model transferability; (ii) when fine-tuning models on downstream tasks, obtaining features with more NC on downstream data results in better test accuracy on the given task. The above results not only demystify many widely used heuristics in model pre-training (e.g., data augmentation, projection head, self-supervised learning), but also lead to a more efficient and principled fine-tuning method for downstream tasks, which we demonstrate through extensive experimental results.
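
Property (i) of neural collapse can be made concrete with a small measurement. The sketch below computes a simplified proxy: the mean squared distance of features to their class mean, divided by the mean squared distance of class means to the global mean. This ratio is an illustrative assumption and not necessarily the metric used in the paper; the function name `variability_ratio` and the toy features are hypothetical. A value near zero indicates that within-class variability has collapsed relative to between-class separation.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variability_ratio(features_by_class: Dict[str, List[Vector]]) -> float:
    """Within-class scatter divided by between-class scatter (simplified proxy)."""
    class_means = {c: mean(feats) for c, feats in features_by_class.items()}
    all_feats = [f for feats in features_by_class.values() for f in feats]
    global_mean = mean(all_feats)
    within = sum(
        sq_dist(f, class_means[c])
        for c, feats in features_by_class.items()
        for f in feats
    ) / len(all_feats)
    between = sum(
        sq_dist(m, global_mean) for m in class_means.values()
    ) / len(class_means)
    return within / between


# Fully collapsed features: every sample sits exactly on its class mean.
collapsed = {"a": [[1.0, 0.0], [1.0, 0.0]], "b": [[-1.0, 0.0], [-1.0, 0.0]]}
# Same class means, but samples are spread around them.
spread = {"a": [[2.0, 0.0], [0.0, 0.0]], "b": [[-2.0, 0.0], [0.0, 0.0]]}

print(variability_ratio(collapsed))  # 0.0
print(variability_ratio(spread))     # 1.0
```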

Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective

Xingxuan Li , Yutong Li , Linlin Liu , Lidong Bing , Shafiq Joty

Categories: Natural Language Processing | Artificial Intelligence

2022-12-20

Are large language models (LLMs) like GPT-3 psychologically safe? In this work, we design unbiased prompts to evaluate LLMs systematically from a psychological perspective. First, we test the personality traits of three different LLMs with the Short Dark Triad (SD-3) and the Big Five Inventory (BFI). We find that all of them score higher on SD-3 than the human average, indicating a relatively darker personality. Furthermore, LLMs like InstructGPT and FLAN-T5, which are fine-tuned with safety metrics, do not necessarily have more positive personalities. They score higher on Machiavellianism and Narcissism than GPT-3. Second, we test the LLMs in the GPT-3 series on well-being tests to study the impact of fine-tuning with more training data. Interestingly, we observe a continuous increase in well-being scores from GPT-3 to InstructGPT. Following these observations, we show that instruction-finetuning FLAN-T5 with positive answers in BFI can effectively improve the model from a psychological perspective. Finally, we call on the community to evaluate and improve LLMs' safety systematically instead of at the sentence level only.

When Federated Learning Meets Pre-trained Language Models' Parameter-Efficient Tuning Methods

Zhuo Zhang , Yuanhang Yang , Yong Dai , Lizhen Qu , Zenglin Xu

Categories: Machine Learning | Natural Language Processing

2022-12-20

With increasing privacy concerns about data, recent studies have made significant progress using federated learning (FL) on privacy-sensitive natural language processing (NLP) tasks. Much of the literature suggests that fully fine-tuning pre-trained language models (PLMs) in the FL paradigm can mitigate the data heterogeneity problem and close the performance gap with centralized training. However, large PLMs bring prohibitive communication overhead and local model adaptation costs to the FL system. To this end, we introduce various parameter-efficient tuning (PETuning) methods into federated learning. Specifically, we provide a holistic empirical study of representative PLM tuning methods in FL. The experimental results cover the analysis of data heterogeneity levels, data scales, and different FL scenarios. Overall communication overhead can be significantly reduced by locally tuning and globally aggregating lightweight model parameters while maintaining acceptable performance in various FL settings. To facilitate research on PETuning in FL, we also develop a federated tuning framework, FedPETuning, which allows practitioners to conveniently exploit different PETuning methods under the FL training paradigm. The source code is available at https://github.com/iezhuozhuo/FedETuning/tree/deltaTuning.
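
The phrase "locally tuning and globally aggregating lightweight model parameters" can be sketched as a server-side aggregation step that only ever sees the small tuned tensors, never the frozen backbone. The example below assumes FedAvg-style averaging weighted by each client's sample count. The aggregation rule, the function `aggregate`, and the parameter name `adapter.weight` are illustrative assumptions and are not taken from the FedPETuning code.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def aggregate(client_updates: List[Tuple[Params, int]]) -> Params:
    """Average lightweight parameters across clients, weighted by sample count.

    Each entry of client_updates is (tuned_parameters, num_local_samples).
    Only the tuned parameters are communicated; the frozen pre-trained
    weights stay on the clients and are not part of the payload.
    """
    total_samples = sum(n for _, n in client_updates)
    first_params = client_updates[0][0]
    aggregated: Params = {}
    for name, values in first_params.items():
        aggregated[name] = [
            sum(params[name][i] * n for params, n in client_updates) / total_samples
            for i in range(len(values))
        ]
    return aggregated


# Two clients send only their small adapter tensors.
updates = [
    ({"adapter.weight": [1.0, 2.0]}, 1),  # client with 1 local sample
    ({"adapter.weight": [3.0, 6.0]}, 3),  # client with 3 local samples
]
global_params = aggregate(updates)
print(global_params)  # {'adapter.weight': [2.5, 5.0]}
```

The communication saving in this sketch comes purely from the payload: each client uploads a handful of adapter values instead of the full model.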

Let's Negotiate! A Survey of Negotiation Dialogue Systems

Haolan Zhan , Yufei Wang , Tao Feng , Yuncheng Hua , Suraj Sharma , Zhuang Li , Lizhen Qu , Gholamreza Haffari

Category: Natural Language Processing

2022-12-18

Negotiation is one of the crucial abilities in human communication, and there has recently been a resurgence of research interest in negotiation dialogue systems, whose goal is to give intelligent agents this ability so that they can efficiently help humans resolve conflicts or reach beneficial agreements. Although there have been many explorations of negotiation dialogue systems, a systematic review of this task has so far been notably absent. We aim to fill this gap by reviewing contemporary studies in the emerging field of negotiation dialogue systems, covering benchmarks, evaluations, and methodologies. Furthermore, we discuss potential future directions, including multi-modal, multi-party, and cross-cultural negotiation scenarios. Our goal is to provide the community with a systematic overview of negotiation dialogue systems and to inspire future research.

Modeling Global Distribution for Federated Learning with Label Distribution Skew

Tao Sheng , Chengchao Shen , Yuan Liu , Yeyu Ou , Zhe Qu , Jianxin Wang

Categories: Machine Learning | Computer Vision

2022-12-17

Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in the more general case, the distributions of labels differ among clients, a situation called "label distribution skew". Directly applying conventional federated learning without accounting for label distribution skew significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by label distribution skew. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so the global model can be trained using global information about the data distribution without privacy leakage. The experimental results demonstrate that our proposed method significantly outperforms the state of the art on several public benchmarks. Code is available at https://github.com/Sheng-T/FedMGD.

Instance-specific Label Distribution Regularization for Learning with Label Noise

Zehui Liao , Shishuai Hu , Yutong Xie , Yong Xia

Category: Computer Vision

2022-12-16

Modeling the noise transition matrix is a promising approach to learning with label noise. Based on the estimated noise transition matrix and the noisy posterior probabilities, the clean posterior probabilities, which are jointly called the Label Distribution (LD) in this paper, can be calculated and used as the supervision. To reliably estimate the noise transition matrix, some methods assume that anchor points are available during training. Nonetheless, if the anchor points are invalid, the noise transition matrix might be poorly learned, resulting in poor performance. Consequently, other methods treat reliable data points, extracted from the training data, as pseudo anchor points. However, from a statistical point of view, the noise transition matrix can be inferred from data with noisy labels under the clean-label-domination assumption. Therefore, we aim to estimate the noise transition matrix without (pseudo) anchor points. There is evidence that samples are more likely to be mislabeled as other, similar classes, which means the mislabeling probability is highly correlated with the inter-class correlation. Inspired by this observation, we propose an instance-specific Label Distribution Regularization (LDR), in which the instance-specific LD is estimated as the supervision, to prevent DCNNs from memorizing noisy labels. Specifically, we estimate the noisy posterior under the supervision of noisy labels, and approximate the batch-level noise transition matrix by estimating the inter-class correlation matrix with neither anchor points nor pseudo anchor points. Experimental results on two synthetic noisy datasets and two real-world noisy datasets demonstrate that our LDR outperforms existing methods.
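
The role of the noise transition matrix can be illustrated with the usual class-conditional relationship between clean and noisy posteriors. In the sketch below, `T[i][j]` is assumed to be the probability that a sample whose clean label is class `i` is observed with noisy label `j`, so the noisy posterior is obtained by weighting each row of `T` by the clean posterior. The matrix values and the function name `noisy_posterior` are hypothetical, and the sketch uses a single fixed matrix, not the batch-level, instance-specific estimate described in the abstract.

```python
from typing import List


def noisy_posterior(clean: List[float], T: List[List[float]]) -> List[float]:
    """Map a clean class posterior to the noisy-label posterior.

    noisy[j] = sum_i clean[i] * T[i][j], where T[i][j] is the assumed
    probability of observing label j when the clean label is i.
    """
    num_classes = len(clean)
    return [
        sum(clean[i] * T[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


# Hypothetical two-class transition matrix; each row sums to 1.
T = [
    [0.75, 0.25],  # clean class 0 is flipped to class 1 a quarter of the time
    [0.50, 0.50],  # clean class 1 is flipped to class 0 half of the time
]

print(noisy_posterior([1.0, 0.0], T))  # [0.75, 0.25]
print(noisy_posterior([0.5, 0.5], T))  # [0.625, 0.375]
```

Recovering the clean posterior from a noisy one runs this relationship in the opposite direction, which is why the quality of the estimated transition matrix matters so much.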
